Beyond Basic Search: Addressing the Limitations of Semantic Similarity

Beyond Similarity

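Basic semantic search covers the "80% problem": it performs well on simple queries but breaks down on edge cases. When searching by similarity alone, a vector store typically returns the chunks that are numerically closest to the query. If those chunks are nearly identical, the large language model (LLM) receives redundant information, wastes its limited context window, and misses the broader picture.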

Pillars of Advanced Retrieval

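Maximal Marginal Relevance (MMR): Rather than selecting only the most similar items, MMR balances relevance against diversity to avoid repetition. $\text{MMR} = \operatorname*{arg\,max}_{d \in R \setminus S} \left[ \lambda \cdot \text{sim}(d, q) - (1 - \lambda) \cdot \max_{s \in S} \text{sim}(d, s) \right]$, where $q$ is the query, $R$ is the set of retrieved candidates, $S$ is the set of documents already selected, and $\lambda \in [0, 1]$ weights relevance against diversity.
Self-Querying: Uses an LLM to translate natural language into structured metadata filters (for example, filtering by "Lecture 3" or "source: PDF").
Contextual Compression: Compresses the retrieved documents, extracting only the "high-value" snippets relevant to the query and thereby saving tokens.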
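The MMR selection rule above can be sketched in a few lines of plain Python. This is a minimal illustration, not any particular library's implementation: the function names (`cosine`, `mmr_select`), the parameter `lam` (standing for λ), and the toy two-dimensional vectors are all assumptions made for the example.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def mmr_select(query_vec, doc_vecs, k=2, lam=0.5):
    """Greedily pick k document indices using the MMR criterion.

    lam=1.0 reduces to plain similarity ranking; lower values
    penalize candidates that resemble documents already selected.
    """
    candidates = list(range(len(doc_vecs)))
    selected = []
    while candidates and len(selected) < k:
        def score(i):
            relevance = cosine(doc_vecs[i], query_vec)
            # Highest similarity to anything already chosen (0 if nothing chosen yet).
            redundancy = max(
                (cosine(doc_vecs[i], doc_vecs[j]) for j in selected),
                default=0.0,
            )
            return lam * relevance - (1 - lam) * redundancy
        best = max(candidates, key=score)
        selected.append(best)
        candidates.remove(best)
    return selected

# Toy embeddings (illustrative values only).
query = [1.0, 1.0]
docs = [
    [1.0, 0.9],  # 0: very close to the query
    [1.0, 0.8],  # 1: near-duplicate of document 0
    [0.2, 1.0],  # 2: relevant, but covers a different direction
]

by_similarity = mmr_select(query, docs, k=2, lam=1.0)  # relevance only
by_mmr = mmr_select(query, docs, k=2, lam=0.5)         # relevance and diversity
```

With these toy vectors, pure similarity ranking picks the two near-duplicates (documents 0 and 1), while `lam=0.5` keeps document 0 and swaps the duplicate for document 2.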

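The Redundancy Trap

Feeding an LLM three versions of the same paragraph does not make it any smarter; it only makes the prompt more expensive. Diversity is the key to "high-value" context.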

Knowledge Check

You want your system to answer the specific question "What did the instructor say about probability in the third lecture?" Which tool allows the LLM to automatically apply a filter for { "source": "lecture3.pdf" }?

ConversationBufferMemory

Self-Querying Retriever

Contextual Compression

MapReduce Chain
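The filter in the question, { "source": "lecture3.pdf" }, acts on document metadata rather than on embedding similarity. The sketch below shows only that filtering step, under stated assumptions: `parse_query` is a hypothetical rule-based stand-in for the LLM that would normally produce the filter, and the `DOCS` list, file names, and helper names are invented for illustration.

```python
import re

# Illustrative corpus: each chunk carries a "source" metadata field.
DOCS = [
    {"text": "Probability measures how likely an event is.",
     "metadata": {"source": "lecture3.pdf"}},
    {"text": "Regression fits a line to data.",
     "metadata": {"source": "lecture2.pdf"}},
    {"text": "Conditional probability updates beliefs.",
     "metadata": {"source": "lecture3.pdf"}},
]

ORDINALS = {"first": 1, "second": 2, "third": 3, "fourth": 4}

def parse_query(question):
    """Hypothetical stand-in for the LLM: derive a metadata filter from text."""
    match = re.search(r"lecture\s+(\d+)", question, re.IGNORECASE)
    if match:
        number = int(match.group(1))
    else:
        match = re.search(r"(first|second|third|fourth)\s+lecture",
                          question, re.IGNORECASE)
        number = ORDINALS[match.group(1).lower()] if match else None
    # An empty filter means "no restriction".
    return {"source": f"lecture{number}.pdf"} if number else {}

def filter_docs(docs, metadata_filter):
    """Keep only documents whose metadata matches every filter key."""
    return [
        d for d in docs
        if all(d["metadata"].get(k) == v for k, v in metadata_filter.items())
    ]

question = "What did the instructor say about probability in the third lecture?"
metadata_filter = parse_query(question)
matches = filter_docs(DOCS, metadata_filter)
```

In a real system the filtered subset would then be ranked by similarity to the remaining semantic part of the question ("probability"); that ranking step is omitted here.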

Challenge: The Token Limit Dilemma

Apply advanced retrieval strategies to solve a real-world constraint.

You are building a RAG system for a legal firm. The documents retrieved are 50 pages long, but only 2 sentences per page are actually relevant to the user's specific query. The standard "Stuff" chain is throwing an OutOfTokens error because the context window is overflowing with irrelevant text.

Step 1

Identify the core problem and select the appropriate advanced retrieval tool to solve it without losing specific nuances.

Problem: The context window limit is being exceeded by "low-nutrient" text surrounding the relevant facts.

Tool Selection: ContextualCompressionRetriever

Step 2

What specific component must you use in conjunction with this retriever to "squeeze" the documents?

Solution: Use an LLMChainExtractor as the base compressor for the ContextualCompressionRetriever. It processes each retrieved document and extracts only the snippets relevant to the query, passing a much smaller, highly concentrated context to the final prompt.
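To make the "squeeze" step concrete, here is a minimal sketch of extractive compression in plain Python. It does not use LangChain and is not how LLMChainExtractor works internally: a simple keyword-overlap rule stands in for the LLM's relevance judgment, and the names `keywords`, `compress`, `min_overlap`, and the sample contract text are assumptions made for the example.

```python
import re

# Small illustrative stopword list; a real system would rely on the LLM instead.
STOPWORDS = {"the", "a", "an", "of", "in", "is", "to", "and", "for",
             "what", "does", "say", "about"}

def keywords(text):
    """Lowercased content words of a text, minus stopwords."""
    return {w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS}

def compress(document, query, min_overlap=2):
    """Keep only sentences sharing at least min_overlap keywords with the query."""
    query_terms = keywords(query)
    sentences = re.split(r"(?<=[.!?])\s+", document.strip())
    return [s for s in sentences if len(keywords(s) & query_terms) >= min_overlap]

page = (
    "This agreement is made between the parties. "
    "The termination clause requires ninety days written notice. "
    "Headings are for convenience only. "
    "Either party may waive the notice period for termination in writing."
)
query = "What does the contract say about termination notice?"

kept = compress(page, query)
compressed_context = " ".join(kept)
```

Only the two sentences that mention both termination and notice survive, so the context passed to the final prompt is shorter than the original page. An LLM-based extractor makes the same kind of keep-or-drop decision, but using semantic judgment instead of keyword overlap.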